An improved centroid classifier for text categorization
Abstract
In text categorization, the Centroid Classifier has proved to be a simple yet efficient method. However, it often suffers from inductive bias, or model misfit, incurred by its underlying assumption. To address this issue, we propose a novel batch-updated approach to enhance the performance of the Centroid Classifier. The main idea is to use training errors to successively update the classification model in batches. The technique is simple to implement and adapts well to text data. Experimental results indicate that it can significantly improve the performance of the Centroid Classifier. © 2007 Elsevier Ltd. All rights reserved.
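The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible reading of "use training errors to update the model in batches", not the authors' algorithm. It assumes documents are rows of a dense term-weight matrix, that similarity is cosine, and that every class has at least one training document. The names `fit_centroids`, `predict`, `batch_update`, and the parameters `rate` and `epochs` are hypothetical. After each pass over the training set, every misclassified document pulls its true class centroid toward itself and pushes the wrongly chosen centroid away, and all corrections are applied at once.

```python
import numpy as np


def _unit_rows(m):
    """Scale each row to unit L2 norm, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0.0, 1.0, norms)


def fit_centroids(X, y, n_classes):
    """Plain centroid classifier: one normalized mean vector per class.

    Assumes every class index in range(n_classes) occurs in y.
    """
    centroids = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        centroids[c] = X[y == c].mean(axis=0)
    return _unit_rows(centroids)


def predict(X, centroids):
    """Assign each document to the centroid with the highest cosine similarity."""
    return np.argmax(_unit_rows(X) @ centroids.T, axis=1)


def batch_update(X, y, centroids, rate=0.5, epochs=10):
    """Hypothetical batch update driven by training errors (an assumption,
    not the rule from the paper)."""
    Xn = _unit_rows(X)
    centroids = centroids.copy()
    for _ in range(epochs):
        pred = np.argmax(Xn @ centroids.T, axis=1)
        wrong = pred != y
        if not wrong.any():
            break  # no training errors left, nothing to correct
        # Accumulate all corrections first, then apply them in one batch.
        delta = np.zeros_like(centroids)
        np.add.at(delta, y[wrong], rate * Xn[wrong])          # pull true class
        np.subtract.at(delta, pred[wrong], rate * Xn[wrong])  # push wrong class
        centroids = _unit_rows(centroids + delta)
    return centroids
```

A typical call would build the initial model with `fit_centroids(X, y, n_classes)`, refine it with `batch_update(X, y, centroids)`, and classify new documents with `predict`. The step size `rate` and the pass limit `epochs` would need tuning on held-out data.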
Related papers
An Effective Approach to Enhance Centroid Classifier for Text Categorization
Centroid Classifier has been shown to be a simple and yet effective method for text categorization. However, it is often plagued with model misfit (or inductive bias) incurred by its assumption. To address this issue, a novel Model Adjustment algorithm was proposed. The basic idea is to make use of some criteria to adjust Centroid Classifier model. In this work, the criteria include training-se...
Towards enhancing centroid classifier for text classification - A border-instance approach
Text classification/categorization (TC) is the task of assigning new unlabeled natural language documents to predefined thematic categories. The centroid-based classifier (CC) has been widely used for TC because of its simplicity and efficiency. However, it has long been criticized for its relatively low classification accuracy compared with state-of-the-art classifiers such as support vector machines...
Improving kNN Text Categorization by Removing Outliers from Training Set
We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the Centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.
Weight Adjustment Schemes for a Centroid Based Classifier
In recent years we have seen tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization, which is the task of assigning text documents to pre-specified classes (topics or themes) of documents, is an important task that can help in both organizing and finding information on t...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Document Categorization
Assigning documents to related categories is a critical task for effective document retrieval. Automatic text classification is the process of assigning new text documents to predefined categories based on their content. In this paper, we implemented and compared Naïve Bayes and Centroid-based algorithms for effective document categorization of English language text....
Journal: Expert Syst. Appl.
Volume: 35
Publication date: 2008